Goto

Collaborating Authors

 observation function


A Architectures, Hyper-parameters and Algorithms

Neural Information Processing Systems

Our approach, named ORDER, uses a three-step training process. In the next parts of this section, we'll explain the methods, structures, and settings we use in each of After that, we'll talk about how we set up and carried out our experiments. In this section, we'll break down the design of the state encoder, how we decided on the best We used a grid search strategy to find the optimal hyper-parameters for our experiments. This allowed each observation dimension to match up with a state factor. We summarize the training process in Algorithm 1.







From CAD to POMDP: Probabilistic Planning for Robotic Disassembly of End-of-Life Products

Baumgärtner, Jan, Hansjosten, Malte, Hald, David, Hauptmannl, Adrian, Puchta, Alexander, Fleischer, Jürgen

arXiv.org Artificial Intelligence

Abstract-- T o support the circular economy, robotic systems must not only assemble new products but also disassemble end-of-life (EOL) ones for reuse, recycling, or safe disposal. Existing approaches to disassembly sequence planning often assume deterministic and fully observable product models, yet real EOL products frequently deviate from their initial designs due to wear, corrosion, or undocumented repairs. We argue that disassembly should therefore be formulated as a Partially Observable Markov Decision Process (POMDP), which naturally captures uncertainty about the product's internal state. We present a mathematical formulation of disassembly as a POMDP, in which hidden variables represent uncertain structural or physical properties. Building on this formulation, we propose a task and motion planning framework that automatically derives specific POMDP models from CAD data, robot capabilities, and inspection results. T o obtain tractable policies, we approximate this formulation with a reinforcement-learning approach that operates on stochastic action outcomes informed by inspection priors, while a Bayesian filter continuously maintains beliefs over latent EOL conditions during execution. Using three products on two robotic systems, we demonstrate that this probabilistic planning framework outperforms deterministic baselines in terms of average disassembly time and variance, generalizes across different robot setups, and successfully adapts to deviations from the CAD model, such as missing or stuck parts. I. INTRODUCTION Modern industrial production still follows a linear model of make-use-dispose, accelerating the depletion of natural resources on our planet.




Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability

Bovy, Eline M., Probine, Caleb, Suilen, Marnix, Topcu, Ufuk, Jansen, Nils

arXiv.org Artificial Intelligence

Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. ME-POMDPs represent a finite set of POMDPs that share the same state, action, and observation spaces, but may arbitrarily vary in their transition, observation, and reward models. Such models arise, for instance, when multiple domain experts disagree on how to model a problem. The goal is to find a single policy that is robust against any choice of POMDP within the set, i.e., a policy that maximizes the worst-case reward across all POMDPs. We generalize and expand on existing work in the following way. First, we show that ME-POMDPs can be generalized to POMDPs with sets of initial beliefs, which we call adversarial-belief POMDPs (AB-POMDPs). Second, we show that any arbitrary ME-POMDP can be reduced to a ME-POMDP that only varies in its transition and reward functions or only in its observation and reward functions, while preserving (optimal) policies. We then devise exact and approximate (point-based) algorithms to compute robust policies for AB-POMDPs, and thus ME-POMDPs. We demonstrate that we can compute policies for standard POMDP benchmarks extended to the multi-environment setting.